feat(compact): add FieldListaggAgg aggregate function#224
feat(compact): add FieldListaggAgg aggregate function#224Zouxxyy wants to merge 10 commits intoalibaba:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new listagg field aggregate for MergeTree compaction to concatenate string values with configurable delimiter support, including an optional distinct/dedup mode.
Changes:
- Introduce
FieldListaggAggaggregator and wire it intoFieldAggregatorFactory. - Add new field-level options for
list-agg-delimiteranddistinctviaCoreOptions/Options. - Add unit tests covering delimiter/null/empty/distinct/type-validation scenarios.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h | Implements listagg aggregation, including distinct token deduplication logic. |
| src/paimon/core/mergetree/compact/aggregate/field_aggregator_factory.h | Registers listagg in the aggregator factory. |
| src/paimon/core/core_options.h | Adds CoreOptions accessors for listagg delimiter and distinct mode. |
| src/paimon/core/core_options.cpp | Implements option parsing for delimiter and distinct mode. |
| include/paimon/defs.h | Declares new option keys (distinct, list-agg-delimiter). |
| src/paimon/common/defs.cpp | Defines the new option key constants. |
| src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp | Adds unit tests for listagg behavior and validation. |
| src/paimon/CMakeLists.txt | Adds the new test file to the test build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp
Outdated
Show resolved
Hide resolved
- Reject empty delimiter when distinct is enabled to prevent
infinite loop in AggDistinctImpl (find("") always returns 0)
- Pre-reserve string capacity and use append() instead of +=
in AggDistinctImpl to avoid repeated reallocations
- Replace assert() with gtest ADD_FAILURE() in test CreateOptions()
to ensure failures are reported properly even under NDEBUG
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp
Outdated
Show resolved
Hide resolved
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
Outdated
Show resolved
Hide resolved
|
Great work! Thanks for enhancing the aggregation functionality! |
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg.h
Outdated
Show resolved
Hide resolved
src/paimon/core/mergetree/compact/aggregate/field_listagg_agg_test.cpp
Outdated
Show resolved
Hide resolved
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
By the way, your change helped us identify a bug in the Java implementation: We're preparing a PR today to fix this on the Java side — thanks for catching this discrepancy! |
Summary
Add
listaggaggregate function for MergeTree compaction. Concatenatesstring values with a configurable delimiter, with optional distinct mode
to deduplicate tokens.
Test plan
Unit tests cover null handling, empty strings, custom delimiters,
distinct dedup, and type validation.
🤖 Generated with [Claude Code]